Smart Paper Notes

A Comparative Study on COVID-19 Fake News Detection Using Different Transformer Based Models

Sajib Kumar Saha Joy , Dibyo Fabian Dofadar , Riyo Hayat Khan , Md. Sabbir Ahmed , Rafeed Rahman

Categories: Natural Language Processing | Machine Learning

2022-08-02

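The rapid growth of social networks and the easy availability of the internet have accelerated the spread of fake news and rumors on social media sites. During the COVID-19 pandemic, such misleading information has aggravated the situation by putting people's mental and physical well-being at risk. To limit the spread of these inaccuracies, identifying fake news on online platforms may be the first step. In this study, the authors carry out a comparative analysis by implementing five transformer-based models, namely BERT, BERT without LSTM, ALBERT, RoBERTa, and a hybrid of BERT and ALBERT, to detect fraudulent COVID-19 news on the internet. The COVID-19 Fake News Dataset was used to train and test the models. Among all these models, RoBERTa performed best, obtaining an F1 score of 0.98 on both the real and fake classes.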
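
The abstract reports a per-class F1 score for the real and fake classes. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The labels below are hypothetical toy data, not drawn from the COVID-19 Fake News Dataset, and the helper name `f1_per_class` is an assumption introduced only for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    """Return (precision, recall, f1) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (illustration only, not real experiment data).
y_true = ["real", "real", "real", "fake", "fake", "fake"]
y_pred = ["real", "real", "fake", "fake", "fake", "real"]

for label in ("real", "fake"):
    precision, recall, f1 = f1_per_class(y_true, y_pred, label)
    print(label, round(precision, 3), round(recall, 3), round(f1, 3))
```

Reporting F1 separately for each class, as the abstract does, shows whether a model is equally reliable on real and fake items rather than strong on only one of them.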

Translated by Google Translate

Restormer: Efficient Transformer for High-Resolution Image Restoration

Syed Waqas Zamir , Aditya Arora , Salman Khan , Munawar Hayat , Fahad Shahbaz Khan , Ming-Hsuan Yang

Categories: Computer Vision

2021-11-18

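Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been widely applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, has shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with spatial resolution, making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key design choices in the building blocks (multi-head attention and feed-forward network) so that it can capture long-range pixel interactions while remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/restormer.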
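
To make the quadratic-cost claim concrete, the following sketch counts the entries of a single spatial self-attention map when every pixel (or patch) is one token. It illustrates only the scaling problem the abstract describes, not Restormer's own attention design; the function name and the `patch` parameter are assumptions for illustration.

```python
def attention_map_entries(height, width, patch=1):
    """Entries in one spatial self-attention map for a height x width input.

    Each patch of size `patch` x `patch` is treated as one token, so the
    map has tokens * tokens entries.
    """
    tokens = (height // patch) * (width // patch)
    return tokens * tokens


small = attention_map_entries(128, 128)
large = attention_map_entries(256, 256)

# Doubling each side gives 4x the tokens and therefore 16x the map entries.
print(small, large, large // small)
```

This growth is why plain spatial self-attention becomes impractical on high-resolution inputs, which is the limitation the abstract sets out to address.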

Intriguing Properties of Vision Transformers

Muzammal Naseer , Kanchana Ranasinghe , Salman Khan , Munawar Hayat , Fahad Shahbaz Khan , Ming-Hsuan Yang

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-05-21

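Vision transformers (ViTs) have demonstrated impressive performance across a variety of machine vision problems. These models are based on multi-head self-attention mechanisms that can flexibly attend to a sequence of image patches to encode contextual cues. An important question is how this flexibility in attending to image-wide context, conditioned on a given patch, helps in handling nuisances in natural images, e.g., severe occlusions, domain shifts, spatial permutations, and adversarial and natural perturbations. We study this question systematically through an extensive set of experiments covering three ViT families and comparisons with a high-performing convolutional neural network (CNN). We show and analyze the following intriguing properties of ViTs: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., they retain as much as 60% top-1 accuracy on ImageNet even after 80% of the image content is randomly occluded. (b) The robustness to occlusions is not due to a bias towards local textures, and ViTs are significantly less biased towards textures than CNNs. When properly trained to encode shape-based features, ViTs demonstrate shape recognition capability comparable to that of the human visual system, previously unmatched in the literature. (c) Using ViTs to encode shape representations leads to an interesting consequence: accurate semantic segmentation without pixel-level supervision. (d) Off-the-shelf features from a single ViT model can be combined to create a feature ensemble, leading to high accuracy across a range of classification datasets in both traditional and few-shot learning paradigms. We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by the self-attention mechanism.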

Transformers in Vision: A Survey

Salman Khan , Muzammal Naseer , Munawar Hayat , Syed Waqas Zamir , Fahad Shahbaz Khan , Mubarak Shah

Categories:

2021-01-04

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences, unlike recurrent networks such as long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text, and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of Transformer models in the computer vision discipline. We start with an introduction to the fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of Transformers in vision, including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization), and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques in terms of both architectural design and experimental value. Finally, we provide an analysis of open research directions and possible future work. We hope this effort will ignite further interest in the community to solve current challenges in applying Transformer models to computer vision.

Stylized Adversarial Defense

Muzammal Naseer , Salman Khan , Munawar Hayat , Fahad Shahbaz Khan , Fatih Porikli

Categories: Computer Vision

2020-07-29

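Deep convolutional neural networks (CNNs) can easily be fooled by subtle, imperceptible changes to the input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that use only class-boundary information (e.g., via a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries, which are in turn used to learn a robust model. Specifically, we use the style and content information of a target sample from another class, alongside its class-boundary information, to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separating adversaries. Subsequently, we propose a max-margin adversarial training approach that minimizes the distance between the source image and its adversary and maximizes the distance between the adversary and the target image. Compared with state-of-the-art defenses, our adversarial training approach demonstrates strong robustness, generalizes well to naturally occurring corruptions and data distribution shifts, and retains model accuracy on clean examples.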
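
The max-margin idea described above (pull the adversary towards its source, push it away from the target) can be sketched as a triplet-style hinge loss over feature vectors. The abstract does not give the exact formulation, so the hinge form, the Euclidean distance, and the `margin` parameter below are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_margin_loss(source_feat, adversary_feat, target_feat, margin=1.0):
    """Hinge-style loss (assumed form, not the paper's exact objective).

    The loss is zero once the adversary is closer to the source than to
    the target by at least `margin`.
    """
    d_source = l2_distance(adversary_feat, source_feat)
    d_target = l2_distance(adversary_feat, target_feat)
    return max(0.0, d_source - d_target + margin)


# Hypothetical 2-D features: the adversary sits far from its source.
source, target = [0.0, 0.0], [3.0, 0.0]
print(max_margin_loss(source, [3.0, 4.0], target))
# After training pulls the adversary's features back towards the source.
print(max_margin_loss(source, [0.5, 0.0], target))
```

In this sketch the first call is penalized because the adversary's features are closer to the target than to the source, while the second call incurs no loss.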

Towards Robust and Reproducible Active Learning Using Neural Networks

Prateek Munjal , Nasir Hayat , Munawar Hayat , Jamshid Sourati , Shadab Khan

Categories: Machine Learning | Computer Vision | Machine Learning (Statistics)

2020-02-21

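Active learning (AL) is a promising ML paradigm that has the potential to parse through large amounts of unlabeled data and help reduce annotation costs in domains where labeling data can be prohibitively expensive. Recently proposed neural-network-based AL methods use different heuristics to achieve this goal. In this study, we demonstrate that under identical experimental settings, different types of AL algorithms (uncertainty-based, diversity-based, and committee-based) produce inconsistent gains over a random sampling baseline. Through a variety of experiments that control for sources of stochasticity, we show that the variance in performance metrics achieved by AL algorithms can lead to results that are inconsistent with previously reported ones. We also find that under strong regularization, AL methods show marginal or no advantage over the random sampling baseline across a variety of experimental conditions. Finally, we conclude with a set of recommendations on how to assess the results of a new AL algorithm so that they are reproducible and robust under changes in experimental conditions. We believe our findings and recommendations will help advance reproducible research in AL using neural networks. To facilitate AL evaluations, we open-source our code at https://github.com/prateekmunjal/torchal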
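
The comparison at the heart of this study, an uncertainty-based acquisition function against a random sampling baseline with controlled randomness, can be sketched as follows. Predictive entropy is used here as one common uncertainty heuristic; the function names, the toy probabilities, and the choice of entropy are assumptions for illustration and are not taken from the authors' code.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(pool_probs, budget):
    """Indices of the `budget` pool samples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


def select_random(pool_size, budget, seed):
    """Random baseline; a fixed seed keeps the selection reproducible."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), budget)


# Hypothetical softmax outputs for four unlabeled samples.
pool_probs = [[0.98, 0.02], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
print(select_uncertain(pool_probs, 2))
print(select_random(len(pool_probs), 2, seed=0))
```

Seeding the baseline explicitly is one small example of controlling a source of stochasticity, which the abstract identifies as necessary for a fair comparison.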

Floods Relevancy and Identification of Location from Twitter Posts using NLP Techniques

Muhammad Suleman , Muhammad Asif , Tayyab Zamir , Ayaz Mehmood , Jebran Khan , Nasir Ahmad , Kashif Ahmad

Categories: Natural Language Processing

2023-01-01

This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts, while LETT is a Named Entity Recognition (NER) task that aims at extracting location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, DistilBERT, and ALBERT, obtaining F1-scores of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models, namely BERT, RoBERTa, and DistilBERT, obtaining F1-scores of 0.6256, 0.6744, and 0.6723, respectively.

Skeletal Video Anomaly Detection using Deep Learning: Survey, Challenges and Future Directions

Pratik K. Mishra , Alex Mihailidis , Shehroz S. Khan

Categories: Computer Vision

2022-12-31

The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.

Guidance Through Surrogate: Towards a Generic Diagnostic Attack

Muzammal Naseer , Salman Khan , Fatih Porikli , Fahad Shahbaz Khan

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-30

Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversarial gradient directions, a phenomenon known as 'gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one of the potential causes of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack approach is based on a 'match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size. Furthermore, our proposed G-PGA is generic, so it can be combined with an ensemble attack strategy, as we demonstrate for the case of Auto-Attack, leading to improvements in efficiency and convergence speed. More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.

Blind Restoration of Real-World Audio by 1D Operational GANs

Turker Ince , Serkan Kiranyaz , Ozer Can Devecioglu , Muhammad Salman Khan , Muhammad Chowdhury , Moncef Gabbouj

Categories: Machine Learning

2022-12-30

Objective: Although numerous studies on audio restoration have been proposed in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with a limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and durations. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of the restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with a generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts, each with a random severity, to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneering study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide range of SDRs and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository.
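
The results above are stated as SDR improvements in dB. As a rough illustration of how such a number arises, the sketch below uses one common time-domain definition, SDR = 10 log10(signal power / distortion power), and takes the difference before and after restoration. The abstract does not specify the exact SDR variant used, so this definition and the toy signals are assumptions.

```python
import math


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB (assumed basic time-domain form).

    Assumes `estimate` differs from `reference`; identical signals would
    give zero distortion power and an undefined ratio.
    """
    signal_power = sum(r * r for r in reference)
    distortion_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / distortion_power)


# Hypothetical toy signals, not real audio.
clean = [1.0, -1.0, 1.0, -1.0]
corrupted = [1.5, -0.5, 1.5, -0.5]
restored = [1.1, -0.9, 1.1, -0.9]

improvement = sdr_db(clean, restored) - sdr_db(clean, corrupted)
print(round(sdr_db(clean, corrupted), 2), round(sdr_db(clean, restored), 2))
print(round(improvement, 2))
```

An SDR improvement is therefore a relative measure: it says how much closer the restored signal is to the clean reference than the corrupted input was.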
